17 research outputs found

    Programming and parallelising applications for distributed infrastructures

    The last decade has witnessed unprecedented changes in parallel and distributed infrastructures. Due to the diminished gains in processor performance from increasing clock frequency, manufacturers have moved from uniprocessor architectures to multicores; as a result, clusters of computers have incorporated such new CPU designs. Furthermore, the ever-growing need of scientific applications for computing and storage capabilities has motivated the appearance of grids: geographically-distributed, multi-domain infrastructures based on sharing of resources to accomplish large and complex tasks. More recently, clouds have emerged by combining virtualisation technologies, service-orientation and business models to deliver IT resources on demand over the Internet. The size and complexity of these new infrastructures pose a challenge for programmers to exploit them. On the one hand, some of the difficulties are inherent to concurrent and distributed programming itself, e.g. dealing with thread creation and synchronisation, messaging, data partitioning and transfer, etc. On the other hand, other issues are related to the singularities of each scenario, like the heterogeneity of Grid middleware and resources or the risk of vendor lock-in when writing an application for a particular Cloud provider. In the face of such a challenge, programming productivity - understood as a trade-off between programmability and performance - has become crucial for software developers. There is a strong need for high-productivity programming models and languages, which should provide simple means for writing parallel and distributed applications that can run on current infrastructures without sacrificing performance. In that sense, this thesis contributes Java StarSs, a programming model and runtime system for developing and parallelising Java applications on distributed infrastructures. The model has two key features: first, the user programs in a fully-sequential, standard-Java fashion - no parallel construct, API call or pragma needs to be included in the application code; second, it is completely infrastructure-unaware, i.e. programs do not contain any details about deployment or resource management, so that the same application can run on different infrastructures with no changes. The only requirement for the user is to select the application tasks, which are the model's unit of parallelism. Tasks can be either regular Java methods or web service operations, and they can handle any data type supported by the Java language, namely files, objects, arrays and primitives. For the sake of simplicity of the model, Java StarSs shifts the burden of parallelisation from the programmer to the runtime system. The runtime is responsible for modifying the original application to make it create asynchronous tasks and synchronise data accesses from the main program. Moreover, the implicit inter-task concurrency is automatically discovered as the application executes, thanks to a data dependency detection mechanism that handles all the Java data types. This thesis provides a fairly comprehensive evaluation of Java StarSs in three different distributed scenarios: Grid, Cluster and Cloud. For each of them, a runtime system was designed and implemented to exploit its particular characteristics and address its issues, while keeping the infrastructure unawareness of the programming model. The evaluation compares Java StarSs against state-of-the-art solutions, both in terms of programmability and performance, and demonstrates how the model can bring remarkable productivity to programmers of parallel and distributed applications.

    A caching mechanism to exploit object store speed in High Energy Physics analysis

    Data analysis workflows in High Energy Physics (HEP) read data written in the ROOT columnar format. Such data has traditionally been stored in files that are often read via the network from remote storage facilities, which represents a performance penalty, especially for data processing workflows that are I/O bound. To address that issue, this paper presents a new caching mechanism, implemented in the I/O subsystem of ROOT, which is independent of the storage backend used to write the dataset. Notably, it can be used to leverage the speed of high-bandwidth, low-latency object stores. The performance of this caching approach is evaluated by running a real physics analysis on an Intel DAOS cluster, both on a single node and distributed over multiple nodes.
    This work benefited from the support of the CERN Strategic R&D Programme on Technologies for Future Experiments [1] and from grant PID2020-113656RB-C22 funded by Ministerio de Ciencia e Innovacion MCIN/AEI/10.13039/501100011033. The hardware used to perform the experimental evaluation involving DAOS (the HPE Delphi cluster described in Sect. 5.2) was made available thanks to a collaboration agreement with Hewlett-Packard Enterprise (HPE) and Intel. User access to the Virgo cluster at the GSI institute was given for the purpose of running the benchmarks using the Lustre filesystem.
    Padulano, V. E.; Tejedor Saavedra, E.; Alonso-Jordá, P.; López Gómez, J.; Blomer, J. (2022). A caching mechanism to exploit object store speed in High Energy Physics analysis. Cluster Computing. 1-16. https://doi.org/10.1007/s10586-022-03757-211
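    The kind of workload this paper targets can be pictured with a short, illustrative PyROOT snippet: an RDataFrame analysis whose input is read over the network, so every event loop pays the remote I/O cost that a cache in the ROOT I/O layer is meant to absorb. The URL, tree name and branch names below are placeholders, not taken from the paper.

```python
# Illustrative sketch of an I/O-bound RDataFrame analysis reading remote data.
# The dataset location and branch names are invented for the example.
import ROOT

# Remote read over XRootD: without a cache, every pass over the data goes
# through the network to the storage facility.
df = ROOT.RDataFrame("Events", "root://remote.storage.example//data/sample.root")

# A typical columnar analysis step: select events and fill a histogram.
hist = (df.Filter("nMuon >= 2", "at least two muons")
          .Define("pt_lead", "Muon_pt[0]")
          .Histo1D(("pt_lead", "Leading muon p_{T};p_{T} [GeV];Events",
                    100, 0.0, 200.0),
                   "pt_lead"))

hist.GetValue()  # triggers the event loop; remote I/O dominates the run time
```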

    Prototyping a ROOT-based distributed analysis workflow for HL-LHC: the CMS use case

    The challenges expected for the next era of the Large Hadron Collider (LHC), both in terms of storage and computing resources, provide LHC experiments with a strong motivation for evaluating ways of rethinking their computing models at many levels. Great efforts have been put into optimizing computing resource utilization for data analysis, which leads both to lower hardware requirements and to faster turnaround for physics analyses. In this scenario, the Compact Muon Solenoid (CMS) collaboration is involved in several activities aimed at benchmarking different solutions for running High Energy Physics (HEP) analysis workflows. A promising solution is evolving software towards more user-friendly approaches featuring a declarative programming model and interactive workflows. The computing infrastructure should keep up with this trend by offering modern interfaces on the one hand and, on the other, hiding the complexity of the underlying environment, while efficiently leveraging the already deployed grid infrastructure and scaling toward opportunistic resources like public clouds or HPC centers. This article presents the first example of using the ROOT RDataFrame technology to exploit such next-generation approaches for a production-grade CMS physics analysis. A new analysis facility is created to offer users a modern interactive web interface based on JupyterLab that can leverage HTCondor-based grid resources on different geographical sites. The physics analysis is converted from a legacy iterative approach to the modern declarative approach offered by RDataFrame and distributed over multiple computing nodes. The new scenario offers not only an overall improved programming experience, but also an order-of-magnitude speedup with respect to the previous approach.
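    As a rough illustration of the declarative, distributed pattern described above, the sketch below shows how an RDataFrame analysis might be offloaded to HTCondor-managed workers through a Dask client, using the distributed RDataFrame module available in recent ROOT releases. The cluster parameters, dataset path and selections are placeholders, not the CMS analysis from the paper.

```python
# Hedged sketch: the same declarative RDataFrame code, executed on distributed
# resources via Dask workers submitted as HTCondor jobs. All resource sizes,
# file names and cuts are illustrative.
import ROOT
from dask.distributed import Client
from dask_jobqueue import HTCondorCluster

# Spin up Dask workers as HTCondor jobs on the grid site.
cluster = HTCondorCluster(cores=2, memory="4 GB", disk="4 GB")
cluster.scale(jobs=20)
client = Client(cluster)

# Distributed counterpart of ROOT.RDataFrame (ROOT.RDF.Experimental.Distributed).
DistRDataFrame = ROOT.RDF.Experimental.Distributed.Dask.RDataFrame

df = DistRDataFrame("Events",
                    "root://remote.site.example//store/nanoaod/sample.root",
                    daskclient=client)

# The analysis stays declarative and identical to the single-machine version;
# the backend splits it into ranges, runs them on the workers and merges results.
h = (df.Filter("HLT_IsoMu24", "single-muon trigger")
       .Histo1D(("met", "Missing E_{T};MET [GeV];Events", 100, 0.0, 500.0),
                "MET"))
h.GetValue()
```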

    XtreemOS application execution management: a scalable approach

    Designing a job management system for the Grid is a non-trivial task. While complex middleware can provide many features, it often implies sacrificing performance. Such performance loss is especially noticeable for small jobs. A Job Manager's design also affects the capabilities of the monitoring system. We believe that monitoring a job or asking for a job status should be fast and easy, like running a simple 'ps'. In this paper, we present the job management of XtreemOS - a Linux-based operating system to support Virtual Organizations for the Grid. This management is performed inside the Application Execution Manager (AEM). We evaluate its performance using only one job manager plus the built-in monitoring infrastructure. Furthermore, we present a set of real-world applications using AEM and its features. In XtreemOS we avoid reinventing the wheel and use the Linux paradigm as an abstraction.

    Software Challenges For HL-LHC Data Analysis

    The high energy physics community is discussing where investment is needed to prepare software for the HL-LHC and its unprecedented challenges. The ROOT project has been one of the central software players in high energy physics for decades. From its experience and expectations, the ROOT team has distilled a comprehensive set of areas that should see research and development in the context of data analysis software, in order to make the best use of the HL-LHC's physics potential. This work shows what these areas could be, why the ROOT team believes investing in them is needed, what gains are expected, and where related work is ongoing. It can serve as an indication for future research proposals and collaborations.

    Web-Based Analysis at CERN

    It is often normal for scientists to use IT services every day without appreciating the details of their architecture and operational aspects. Symmetrically, IT experts, while providing reliable services to the scientific community, often do not have the opportunity to dive into the usage patterns of their infrastructures. These two lectures aim to bridge the gap between service users and service providers by considering a particular service of the CERN portfolio, SWAN, which, by providing an interface for web-based data analysis, federates several other production CERN services. The first lecture is dedicated to the general description of the SWAN service, its architecture and its components. Concepts that are generic to any service are stressed and distilled into general knowledge applicable also in areas not related to SWAN. Special attention is dedicated to the way in which the service is packaged in containerized form and how this impacts its operation. Basic concepts of synchronized and shared storage, software distribution, and interactive and mass processing are reviewed, also through concrete examples collected during the first three years of the service. The second lecture is oriented towards interactive web-based data analysis. The attendees are exposed to the basic principles that lead to a statistical interpretation of a dataset, focusing on examples from the collider physics and accelerator physics sectors as well as more traditional big data analysis related to IT infrastructure monitoring. Concepts such as the identification of datasets, their cleaning and their effective processing are explored according to state-of-the-art technology trends.

    Fine-grained data caching approaches to speedup a distributed RDataFrame analysis

    Thanks to its RDataFrame interface, ROOT now supports the execution of the same physics analysis code both on a single machine and on a cluster of distributed resources. In the latter scenario, it is common to read the input ROOT datasets over the network from remote storage systems, which often increases the time it takes for physicists to obtain their results. Storing the remote files much closer to where the computations will run can bring down both latency and execution time. Such a solution can be improved further by caching only the actual portion of the dataset that will be processed on each machine in the cluster, and reusing it in subsequent executions on the same input data. This paper shows the benefits of applying different means of caching input data in a distributed ROOT RDataFrame analysis. Two such mechanisms are applied to this kind of workflow with different configurations, namely caching on the same nodes that process the data or caching on a separate server.
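    The idea of keeping, on each worker, only the slice of the input it actually processes can be pictured with the purely hypothetical sketch below. It is not the mechanism implemented in the paper, and the names in it (LocalRangeCache, fetch_remote_range) are invented for illustration.

```python
# Purely hypothetical sketch of per-range caching on the processing nodes:
# each worker keeps on local disk only the (file, entry-range) slices it has
# already read, so later executions over the same input skip the remote read.
import hashlib
import os
import pickle

class LocalRangeCache:
    """Stores already-fetched dataset slices on node-local storage."""

    def __init__(self, cache_dir="/tmp/range_cache"):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, url, start, stop):
        key = hashlib.sha1(f"{url}:{start}:{stop}".encode()).hexdigest()
        return os.path.join(self.cache_dir, key + ".pkl")

    def get_or_fetch(self, url, start, stop, fetch_remote_range):
        """Return the cached slice if present, otherwise fetch and store it."""
        path = self._path(url, start, stop)
        if os.path.exists(path):               # hit: reuse data cached earlier
            with open(path, "rb") as f:
                return pickle.load(f)
        data = fetch_remote_range(url, start, stop)  # miss: remote read, paid once
        with open(path, "wb") as f:
            pickle.dump(data, f)
        return data
```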

    Distributed data analysis with ROOT RDataFrame

    Widespread distributed processing of big datasets has been around for more than a decade now thanks to Hadoop, but only recently have higher-level abstractions been proposed for programmers to easily operate on those datasets, e.g. Spark. ROOT has joined that trend with its RDataFrame tool for declarative analysis, which currently supports local multi-threaded parallelisation. However, RDataFrame's programming model is general enough to accommodate multiple implementations or backends: users could write their code once and execute it as-is either locally or in a distributed fashion, just by selecting the corresponding backend. This abstract introduces PyRDF, a new Python library developed on top of RDataFrame to seamlessly switch from local to distributed environments with no changes in the application code. In addition, PyRDF has been integrated with a service for web-based analysis, SWAN, where users can dynamically plug in new resources, as well as write, execute, monitor and debug distributed applications via an intuitive interface.
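    A sketch of how that backend switch is typically expressed is shown below; the exact call names may differ between PyRDF versions, and the tree, file and histogram parameters are placeholders.

```python
# Hedged sketch of the PyRDF usage pattern: the analysis code is written once
# and the execution backend is selected separately. Dataset and configuration
# values are illustrative.
import PyRDF

# Select where to run, e.g. a Spark cluster with a given number of partitions.
# Selecting the local backend instead keeps the very same analysis code
# running on the local machine.
PyRDF.use("spark", {"npartitions": 32})

# From here on, the code reads like a regular RDataFrame analysis.
df = PyRDF.RDataFrame("Events", "root://remote.storage.example//data/sample.root")

h = (df.Filter("nMuon >= 2")
       .Histo1D(("muon_pt", "Muon p_{T};p_{T} [GeV];Events",
                 100, 0.0, 200.0),
                "Muon_pt"))

# Accessing the result triggers the distributed (or local) computation.
print("entries in histogram:", h.GetEntries())
```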
